Algorithms for Binary Neural Networks
TABLE 3.4
With different λ and θ, we evaluate the accuracies (%) of BONNs based on
WRN-22 and WRN-40 on CIFAR-10/100. When varying λ, the Bayesian feature
loss is not used (θ = 0). When varying θ, we use the optimal loss weight
(λ = 1e-4) for the Bayesian kernel loss.

Hyper-param.        WRN-22 (BONN)           WRN-40 (BONN)
                 CIFAR-10   CIFAR-100    CIFAR-10   CIFAR-100
λ    1e-3          85.82      59.32        85.79      58.84
     1e-4          86.23      59.77        87.12      60.32
     1e-5          85.74      57.73        86.22      59.93
     0             84.97      55.38        84.61      56.03
θ    1e-2          87.34      60.31        87.23      60.83
     1e-3          86.49      60.37        87.18      61.25
     1e-4          86.27      60.91        87.41      61.03
     0             86.23      59.77        87.12      60.32
3.7.7 Ablation Study
Hyper-Parameter Selection In this section, we evaluate the effects of the hyperparameters
λ and θ on BONN performance. λ and θ balance the Bayesian kernel loss and the Bayesian
feature loss, respectively, against the cross-entropy loss, shaping the distributions of the
kernels and the features. WRN-22 and WRN-40 are used as backbones. The implementation
details are given below.
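The balancing described above can be sketched as a weighted sum of the three loss terms. The function below is a minimal illustration, not the authors' implementation; the names `ce`, `kernel_loss`, and `feature_loss` are placeholders for the computed loss values:

```python
# Minimal sketch of the weighted objective: cross-entropy plus the two
# Bayesian losses, balanced by lam (for kernels) and theta (for features).
def total_loss(ce, kernel_loss, feature_loss, lam=1e-4, theta=1e-3):
    return ce + lam * kernel_loss + theta * feature_loss

# Setting lam or theta to zero disables the corresponding Bayesian loss,
# which reproduces the ablation settings used in Tables 3.4 and 3.5.
print(total_loss(1.0, 10.0, 5.0))             # both Bayesian losses active
print(total_loss(1.0, 10.0, 5.0, theta=0.0))  # Bayesian feature loss off
```

Because λ and θ are small, the cross-entropy term dominates and the Bayesian losses act as regularizers on the kernel and feature distributions.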
As shown in Table 3.4, we first vary λ with θ set to zero to validate the influence of the
Bayesian kernel loss on the kernel distribution. The Bayesian kernel loss effectively
improves the accuracy on CIFAR-10. However, the accuracy does not increase monotonically
with λ, indicating that what matters is not a larger λ but a proper λ that reasonably
balances the cross-entropy loss and the Bayesian kernel loss. For example, when λ is set to
1e-4, we obtain the best balance and the best classification accuracy.
The hyperparameter θ dominates the intraclass variation of the features, and we investigate
the effect of the Bayesian feature loss by varying θ with λ fixed at 1e-4. The results
show that the classification accuracy varies with θ much as it does with λ, verifying that
the Bayesian feature loss leads to better classification accuracy when a proper θ is chosen.
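The two sweeps in Table 3.4 can be enumerated as a small grid. This is a sketch only: each setting would trigger a full training run, and the names below are illustrative:

```python
# Enumerate the settings of Table 3.4: first vary lam with theta = 0
# (Bayesian feature loss off), then vary theta with the best lam fixed.
LAM_GRID = [1e-3, 1e-4, 1e-5, 0.0]
THETA_GRID = [1e-2, 1e-3, 1e-4, 0.0]

def sweep_settings(best_lam=1e-4):
    settings = [(lam, 0.0) for lam in LAM_GRID]              # lam sweep
    settings += [(best_lam, theta) for theta in THETA_GRID]  # theta sweep
    return settings

for lam, theta in sweep_settings():
    print(f"train BONN with lam={lam}, theta={theta}")
```

Note that the setting (λ = 1e-4, θ = 0) appears in both sweeps, which is why the last row of the θ block in Table 3.4 repeats the 1e-4 row of the λ block.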
We also compare the convergence of our method with its counterparts using ResNet-18 on
ImageNet ILSVRC12. As plotted in Fig. 3.22, the XNOR-Net training curve oscillates
vigorously, which we suspect is caused by a suboptimal learning process. In contrast, our
BONN achieves better training and test accuracy.
Effectiveness of Bayesian Binarization on ImageNet ILSVRC12 To better understand
the Bayesian losses, we examine how each loss affects performance on the large-scale
ImageNet ILSVRC12 dataset. Based on the experiments described earlier, we set λ to 1e-4
and θ to 1e-3 whenever the corresponding loss is used. As shown in Table 3.5, both the
Bayesian kernel loss and the Bayesian feature loss independently improve the accuracy on
ImageNet. When applied together, the Top-1 accuracy reaches its highest value of 59.3%.
In Fig. 3.21, we visualize feature maps across the ResNet-18 model on the ImageNet dataset;
they indicate that our method extracts the essential features for accurate classification.
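The four rows of Table 3.5 correspond to switching each loss on or off. A hypothetical configuration table (the dictionary keys are illustrative, not from the original) makes the grid explicit:

```python
# Ablation configurations behind Table 3.5: a loss is switched on by
# giving its weight the value chosen above (lam = 1e-4, theta = 1e-3).
ABLATIONS = {
    "neither":      {"lam": 0.0,  "theta": 0.0},
    "kernel only":  {"lam": 1e-4, "theta": 0.0},
    "feature only": {"lam": 0.0,  "theta": 1e-3},
    "both":         {"lam": 1e-4, "theta": 1e-3},
}
for name, cfg in ABLATIONS.items():
    print(name, cfg)
```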
TABLE 3.5
Effect of Bayesian losses on the ImageNet dataset. The backbone is ResNet-18.

Bayesian kernel loss   Bayesian feature loss   Top-1   Top-5
                                                56.3    79.8
         ✓                                      58.3    80.8
                                 ✓              58.4    80.8
         ✓                       ✓              59.3    81.6